Spectral Density Bootstrap

Exploration and Simulation

Here we replicate the paper’s work by bootstrapping the spectral density of the autoregressive process \[X_t = 0.5X_{t-1} - 0.6X_{t-2} + 0.3X_{t-3} - 0.4X_{t-4} + 0.2X_{t-5} + N_t\] using the given bootstrap algorithm:

set.seed(505)
x1 = arima.sim(n=256, list(ar=c(0.5, -0.6, 0.3, -0.4, 0.2)))
tsplot(x1, col=4)

mvspec(x1, col=4)

mvspec(x1, spans=c(2,2), col=4)

mvspec(x1, spans=c(2,2), log="yes", col=2)

spec = arma.spec(ar=c(0.5, -0.6, 0.3, -0.4, 0.2), main="Autoregression")

Usually, we can only create a confidence interval for our spectral density by using mvspec with the log option. The confidence interval is based on a Chi-square distribution with only 2 degress of freedom. Thus, the interval is too wide and of little use. Therefore, in this work, we will try using a bootstrap approach to create more accurate spectral density estimates.

The following step is to extract the size and frequency of the data.

xfreq = frequency(x1)
n = length(x1)
nspec = floor(n/2)
freq = seq(from = xfreq/n, by = xfreq/n, length = nspec) # extract the frequency 

First step in the algorithm is centering the data by subtracting the sample mean. However, since we will compare with the true spectral density of the autoregression later, we will skip this step and compute the periodogram and spectral density of the true data.

The second step produces the initial estimate of the periodogram and an estimated spectral density with a kernel \(K\) and a global bandwidth \(h\). The paper defines the periodogram to be \[I(\omega) = \frac{1}{2\pi n} \Big| \sum_{t=1}^n x_t e^{-it \omega} \Big|\] for \(\omega_j = 2\pi j/n, -n \leq j \leq n, -\pi \leq \omega_j\leq \pi\).
The formula is, indeed, equivalent to our book’s definition of the periodogram \[I(\omega_j) = |d(\omega_j)|^2 = \frac{1}{n} \Big| \sum_{t=1}^n x_t e^{-2\pi i \omega_j t} \Big|\] for \(j = 0,1,\dots, n-1\) since \(\omega = 2\pi j/n\), but in different domains. Thus, we will the book’s code to compute the periodogram \(I\). Everything we does after this will be in the frequency domain.

I = Mod(fft(x1)/sqrt(100))^2
I = I[1:floor(n/2)] # since I is symmetric, we will keep the first half
tsplot(freq, I, col=4)

Consequently, the paper chooses an inital bandwidth and a kernel to calculate the estimated spectral density \[\hat{f}(\omega,h) = \frac{1}{nh} \sum_{k =-n}^n K \Big(\frac{\omega-\omega_k}{h} \Big) I(\omega).\] where \(K(\cdot)\) is a given symmetric, nonnegative function on the real values. It is noted that the inital global bandwidth should not depend on \(\omega\). The paper chooses the Barlett kernel with a global bandwidth of 0.05. In our exploration, we will work with the Daniell kernel of spans c(5,5). The next cell gives a demonstration of the kernel weights and how we use the kernel to get the smoothed spectral density.

(dm = kernel("modified.daniell", c(5,5))) # for a list 
## mDaniell(5,5) 
## coef[-10] = 0.0025
## coef[ -9] = 0.0100
## coef[ -8] = 0.0200
## coef[ -7] = 0.0300
## coef[ -6] = 0.0400
## coef[ -5] = 0.0500
## coef[ -4] = 0.0600
## coef[ -3] = 0.0700
## coef[ -2] = 0.0800
## coef[ -1] = 0.0900
## coef[  0] = 0.0950
## coef[  1] = 0.0900
## coef[  2] = 0.0800
## coef[  3] = 0.0700
## coef[  4] = 0.0600
## coef[  5] = 0.0500
## coef[  6] = 0.0400
## coef[  7] = 0.0300
## coef[  8] = 0.0200
## coef[  9] = 0.0100
## coef[ 10] = 0.0025
par(mfrow=1:2)
plot(kernel("daniell", 5), ylab=expression(h[~k]), col=4, panel.first=Grid(nxm=5))
plot(dm, ylab=expression(h[~k]), panel.first=Grid(), col=4) # for a plot 

h = c(5,5)  # initial bandwidth 
kd = kernel("daniell", h) 
f.hat = kernapply(I, kd, circular=TRUE)
tsplot(freq, f.hat, col=4)

Our next step is to estimate the residuals \[\hat{\epsilon} = \frac{I(\omega_k)}{\hat{f}(\omega_k,h_i)},\] for \(k = 1, \dots, n\). This results from the interpretation of the spectral estimator as the approximate multiplicative regression \[I(\omega_k) = \hat{f}(\omega_k) \cdot \epsilon_k.\] Then the paper recenters the data to avoid an additional bias in the resampling stage by rescaling the residuals \[\tilde{e}_k = \frac{\hat{\epsilon_k}}{\epsilon}, \text{ for } k=1, \dots, n, \text{ and } \epsilon = \frac{1}{n} \sum_{k=1}^n \hat{\epsilon}_k.\]

eps = I / f.hat
scaled_eps = eps / mean(eps)
mean(scaled_eps)
## [1] 1

The mean of the scaled residuals should be 1.

Now resample the scaled residuals and run boostrap estimate for a large number of times. The resampled periodogram is computed as \[I^*(\omega_k) = I^*(-\omega_k) = \hat{f}(w_k,g) \tilde{\epsilon_k}^*\] where \(g\) is a resampling bandwidth and \(\{\epsilon_k^*\}\) is a resample of the scaled residuals. We will pick \(g\) to be the same as the global bandwidth. Then the bootstrap kernel spectral density is calculated as \[\hat{f}(\omega,h,g) = \frac{1}{nh}\sum_{k=-n}^{n}K \Big(\frac{\omega-\omega_k}{h} \Big) I^*(\omega_k)\]

nboot = 1000
f.star.dist = c(c())
for (i in 1:nboot)
{
  scaled_eps.star = sample(scaled_eps, replace=TRUE)  # resample the scaled residuals 
  # new kernel 
  g = sample(1:5, 1)
  kd.star = kernel("daniell", c(g,g))
  # another estimated spectral density with the new kernel
  f.hat = kernapply(I, kd.star, circular = TRUE)
  # resampled periodogram
  I.star = f.hat * scaled_eps.star
  # bootstrap kernel spectral density
  f.star = kernapply(I.star, kd, circular=TRUE)
  f.star.dist[i] = list(f.star)
}

lwr = c()
upr = c()
for (i in 1:nspec)
{
  f.star.wi = c()
  for (j in 1:nboot)
  {
    f.star.wi[j] = f.star.dist[[j]][i]
  }
  # the 2.5 percentile spectral density at all frequencies
  lwr[i] = quantile(f.star.wi, probs=c(.25, .75))[1]
  # the 97.5 percentile spectral density at all frequencies
  upr[i] = quantile(f.star.wi, probs=c(.25, .75))[2]
}
arma.spec(ar=c(0.5, -0.6, 0.3, -0.4, 0.2), main="Autoregression", ylim=c(0, max(upr,max(spec$spec))))

tsplot(freq, I, col=4)
lines(freq, upr, col=2)
lines(freq, lwr, col=2)

Build an R Package

From the previous exploration, I build my first R package to bootstrap spectral density based on resampling the periodogram. You can find the complete package here: https://github.com/uyenle-gh/spec.boots

The following chunk installs the package into your environment.

Now we will apply our spec.boots function to produce a confidence interval for the spectral density of the autoregression:

kd = kernel("daniell", c(5,5))
spec = spec.boots(x1, 1000, kd, demean=FALSE, alpha=0.95)

Most of the power is concentrated from frequency 0.13 to frequency 0.23, where the 95% confidence interval is above the median baseline. This allows us to conclude that the peak we observe around frequency 0.18 is different from the baseline and is statistically significant.

Another example where we know the exact frequencies is demonstrated below:

y1 = 2*cos(2*pi*1:100*6/100) + 3*sin(2*pi*1:100*6/100)
y2 = 4*cos(2*pi*1:100*10/100) + 5*sin(2*pi*1:100*10/100)
y3 = 6*cos(2*pi*1:100*40/100) + 7*sin(2*pi*1:100*40/100)
y = y1 + y2 + y3
y = ts(y)
spec.boots(y, 1000, kernel("daniell", c(2,2)))

##    freq          lwr          upr
## 1  0.01  0.270223239  17.45207554
## 2  0.02  0.425149841  29.17483180
## 3  0.03  3.613353560  50.40524749
## 4  0.04  6.684144499  84.74913091
## 5  0.05 10.228294102 123.99912881
## 6  0.06 14.113791148 168.73174892
## 7  0.07 20.600388027 241.37377739
## 8  0.08 26.961871032 289.92339361
## 9  0.09 26.730406242 341.70273120
## 10 0.10 29.632080473 374.96964289
## 11 0.11 32.398298039 398.79006550
## 12 0.12 31.765245861 349.16376089
## 13 0.13 24.845023810 295.33638276
## 14 0.14 17.668298140 219.33958502
## 15 0.15 10.316856806 146.58849611
## 16 0.16  0.175864848  85.43525666
## 17 0.17  0.034660813  47.58141656
## 18 0.18  0.010259467  23.60354769
## 19 0.19  0.006406585   9.76800883
## 20 0.20  0.003795360   0.03046949
## 21 0.21  0.003587928   0.02883219
## 22 0.22  0.002863197   0.02676836
## 23 0.23  0.002688253   0.02466545
## 24 0.24  0.002506663   0.02304150
## 25 0.25  0.002309986   0.02119643
## 26 0.26  0.002289392   0.01988411
## 27 0.27  0.002019022   0.01818709
## 28 0.28  0.001860394   0.01632454
## 29 0.29  0.001854202   0.01600192
## 30 0.30  0.001750333   0.01493799
## 31 0.31  0.001617986   0.01457835
## 32 0.32  0.001594527   0.01409162
## 33 0.33  0.002667105  20.36772144
## 34 0.34  0.005072163  49.24809472
## 35 0.35  0.013794963  99.34851270
## 36 0.36  0.180623270 193.68520762
## 37 0.37 21.190621572 311.86491605
## 38 0.38 18.517373620 428.97680477
## 39 0.39 35.057150417 549.14836718
## 40 0.40 44.195429756 666.87758093
## 41 0.41 54.207926080 737.32322266
## 42 0.42 50.298037454 684.59572880
## 43 0.43 46.306856477 587.67380059
## 44 0.44 33.288276679 451.25448922
## 45 0.45 21.456315942 300.87576557
## 46 0.46  1.426250090 189.53943603
## 47 0.47  0.194874613 102.91956909
## 48 0.48  0.167327934  51.76333907
## 49 0.49  0.152880313  21.09192861
## 50 0.50  0.128516663   8.53530200

The spectral density plot of this example shows two peaks around frequency 0.1 and 0.4. The confidence interval corresponding to frequency 0.1 is slightly above the median baseline, so we suspect that this frequency only has a small effect.

Applications

In the next few chunks, we demonstrate how to use spec.boots with real data and compare the result with those produced by the mvspec command.

data("AirPassengers")
tsplot(AirPassengers, col=4, lwd=2)

mvspec(AirPassengers, col=4, lwd=2)

mvspec(AirPassengers, log='y', col=4, lwd=2)

mvspec(AirPassengers, log='y', spans=c(4,4), col=4, lwd=2)

spec.boots(AirPassengers, 1000, c(5,5))

##          freq        lwr        upr
## 1  0.08333333  355.72568  4142.2920
## 2  0.16666667  420.54207  4576.5138
## 3  0.25000000  446.45384  4731.9612
## 4  0.33333333  422.74234  4440.5706
## 5  0.41666667  374.69175  3664.4454
## 6  0.50000000  370.13121  3068.4375
## 7  0.58333333  439.38391  3310.1799
## 8  0.66666667  603.41517  6251.2021
## 9  0.75000000 1010.98026 11819.4380
## 10 0.83333333 1672.14100 18909.4257
## 11 0.91666667 2470.73436 27376.6577
## 12 1.00000000 3079.55389 35076.7355
## 13 1.08333333 3237.30457 38125.7041
## 14 1.16666667 3126.18607 34922.3982
## 15 1.25000000 2541.43597 28261.3938
## 16 1.33333333 1819.00055 20618.4469
## 17 1.41666667 1149.11314 12820.0530
## 18 1.50000000  680.91683  7033.2147
## 19 1.58333333  470.55867  3831.4547
## 20 1.66666667  399.48993  3018.9933
## 21 1.75000000  473.26118  4337.7881
## 22 1.83333333  630.24126  6348.0246
## 23 1.91666667  851.76192  8489.9060
## 24 2.00000000 1096.04818 10263.8790
## 25 2.08333333 1064.68364 11140.4275
## 26 2.16666667  948.72250 10389.6724
## 27 2.25000000  676.99831  8208.0631
## 28 2.33333333  452.50726  5646.9596
## 29 2.41666667  261.12091  3266.7342
## 30 2.50000000  153.35273  1676.9989
## 31 2.58333333   89.47945   794.7784
## 32 2.66666667   69.14466   491.7707
## 33 2.75000000   69.27338   691.4979
## 34 2.83333333   95.04063  1011.3973
## 35 2.91666667  119.18149  1306.3992
## 36 3.00000000  140.21235  1499.2165
## 37 3.08333333  146.98432  1523.9207
## 38 3.16666667  133.40988  1400.2227
## 39 3.25000000  100.98750  1084.9823
## 40 3.33333333   70.90411   786.7359
## 41 3.41666667   43.53271   476.4370
## 42 3.50000000   31.06174   271.6371
## 43 3.58333333   28.12104   197.0744
## 44 3.66666667   33.69321   279.0043
## 45 3.75000000   48.24582   456.2280
## 46 3.83333333   67.65395   678.4795
## 47 3.91666667   90.14435   877.9652
## 48 4.00000000  111.81482  1113.2969
## 49 4.08333333  115.11076  1148.1218
## 50 4.16666667  104.20064  1066.9908
## 51 4.25000000   84.77034   877.5241
## 52 4.33333333   56.60093   665.5390
## 53 4.41666667   38.67698   434.6461
## 54 4.50000000   26.24454   254.1069
## 55 4.58333333   21.21035   157.6562
## 56 4.66666667   21.15284   174.4998
## 57 4.75000000   26.50933   271.4079
## 58 4.83333333   37.66147   424.9991
## 59 4.91666667   49.12580   557.1178
## 60 5.00000000   59.07278   665.9334
## 61 5.08333333   66.56840   710.2042
## 62 5.16666667   67.42700   700.9098
## 63 5.25000000   57.33906   621.9711
## 64 5.33333333   43.68773   484.2052
## 65 5.41666667   31.25137   300.4933
## 66 5.50000000   23.20855   186.3925
## 67 5.58333333   19.79099   133.5227
## 68 5.66666667   26.31766   238.2488
## 69 5.75000000   44.71610   579.7101
## 70 5.83333333   79.27307  1225.4516
## 71 5.91666667  148.83136  2084.3958
## 72 6.00000000  264.50198  3187.1431

For the AirPassengers dataset, it seems like both the mvspec and spec.boots functions agree that frequencies 1 and 2 are dominant. However, it is way more obvious looking at the plot of spec.boots.

data(Inflation)
tsplot(Inflation$CPIPctDiff, col=4, lwd=2)

mvspec(Inflation$CPIPctDiff, col=4, lwd=2)

mvspec(Inflation$CPIPctDiff, col=4, lwd=2, log="y")

mvspec(Inflation$CPIPctDiff, col=4, lwd=2, log="y", spans=c(4,4))

kd = kernel("daniell", c(5,5))
spec.boots(ts(Inflation$CPIPctDiff) , 1000, kd)

##          freq        lwr        upr
## 1  0.01041667 0.04689408 0.16034337
## 2  0.02083333 0.05096311 0.17774595
## 3  0.03125000 0.05652662 0.19706106
## 4  0.04166667 0.06206656 0.21246629
## 5  0.05208333 0.06663648 0.22782192
## 6  0.06250000 0.07400268 0.24538039
## 7  0.07291667 0.07630532 0.25816234
## 8  0.08333333 0.08003779 0.27015910
## 9  0.09375000 0.08309988 0.27978048
## 10 0.10416667 0.08486199 0.28719829
## 11 0.11458333 0.08589066 0.29140894
## 12 0.12500000 0.08535325 0.29285489
## 13 0.13541667 0.08409070 0.28472460
## 14 0.14583333 0.08288236 0.28962627
## 15 0.15625000 0.08092890 0.28004080
## 16 0.16666667 0.07730652 0.27240448
## 17 0.17708333 0.07334972 0.25616363
## 18 0.18750000 0.06993991 0.23825446
## 19 0.19791667 0.06530931 0.22159049
## 20 0.20833333 0.06127102 0.20079512
## 21 0.21875000 0.05714619 0.18851591
## 22 0.22916667 0.05298595 0.17638655
## 23 0.23958333 0.04782545 0.16248253
## 24 0.25000000 0.04432116 0.14670200
## 25 0.26041667 0.03973045 0.13551853
## 26 0.27083333 0.03631679 0.11971745
## 27 0.28125000 0.03261030 0.10677109
## 28 0.29166667 0.02990924 0.09433415
## 29 0.30208333 0.02735213 0.08360982
## 30 0.31250000 0.02451812 0.07349763
## 31 0.32291667 0.02169980 0.06683070
## 32 0.33333333 0.02037498 0.06154113
## 33 0.34375000 0.01940105 0.05533246
## 34 0.35416667 0.01847060 0.05282994
## 35 0.36458333 0.01800448 0.05127177
## 36 0.37500000 0.01742394 0.05231938
## 37 0.38541667 0.01752132 0.05288002
## 38 0.39583333 0.01759420 0.05403791
## 39 0.40625000 0.01831707 0.05653643
## 40 0.41666667 0.01995307 0.06058135
## 41 0.42708333 0.02096501 0.06471582
## 42 0.43750000 0.02321579 0.06923398
## 43 0.44791667 0.02560699 0.07686749
## 44 0.45833333 0.02791782 0.08584259
## 45 0.46875000 0.03119321 0.09813570
## 46 0.47916667 0.03505789 0.11217101
## 47 0.48958333 0.03859909 0.12435642
## 48 0.50000000 0.04287094 0.14097112

The spectral density of the CPI returns peaks around frequency 0.1 for both datasets. Here, it is easier to tell which frequencies matter since we have a baseline to rely on. One drawback to this approach, as you can also tell from the two plots of spec.boots and mvspec, is that some of the peaks are flattened and spread out to others due to averaging.